Overview and Culture of the Different Subfields of Multimedia
THIS PAGE IS NOW OBSOLETE
The foundations of audio, speech, image, and video analytics are rooted both in signal processing, which is part of electrical engineering, and in statistics, which is part of mathematics. Many terms that are still in use have been inherited from these two fields. A newer field, which originated with the advent of the computer, is called “machine learning”; it is actually a subfield of statistics, paired with biology.
The field of machine learning redefines many words originally created in statistics and biology. The application of machine learning and the foundations of signal processing to different kinds of data created the different fields of audio, speech, image, and video processing. Each of them in turn created new terms.
There are a variety of reasons why these fields became separated; some are social and very pragmatic. A very important reason is the amount of data that has to be processed. Audio and speech processing is the oldest field because computers were already fast enough to process speech in the 1960s and 1970s. Image analysis is a slightly newer field, and video analysis is the newest because it involves much more data.
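To make the difference in data volume concrete, the following is a rough back-of-the-envelope sketch. All format parameters (8 kHz, 8-bit mono speech; 640x480 RGB images; 25 frames per second) are illustrative assumptions, not figures taken from this page, and the calculation ignores compression.

```python
# Rough raw (uncompressed) data volumes for speech, a still image, and video.
# Every format parameter used below is an illustrative assumption.

def audio_bytes_per_second(sample_rate_hz, bytes_per_sample, channels):
    """Raw audio data rate in bytes per second."""
    return sample_rate_hz * bytes_per_sample * channels

def image_bytes(width, height, bytes_per_pixel):
    """Raw size of a single image in bytes."""
    return width * height * bytes_per_pixel

def video_bytes_per_second(width, height, bytes_per_pixel, frames_per_second):
    """Raw video data rate in bytes per second (image track only)."""
    return image_bytes(width, height, bytes_per_pixel) * frames_per_second

# Assumed telephone-quality speech: 8 kHz, 1 byte per sample, mono.
speech_rate = audio_bytes_per_second(8_000, 1, 1)

# Assumed 640x480 image with 3 bytes (RGB) per pixel.
single_image = image_bytes(640, 480, 3)

# Assumed video of such images at 25 frames per second.
video_rate = video_bytes_per_second(640, 480, 3, 25)

if __name__ == "__main__":
    print("speech bytes/s:", speech_rate)
    print("one image bytes:", single_image)
    print("video bytes/s:", video_rate)
    print("video/speech ratio:", video_rate // speech_rate)
```

Under these assumptions, a single image already holds more raw data than a minute of speech, and the video stream is several thousand times larger per second than the speech stream, which is consistent with the order in which the fields emerged.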
As these fields matured over time, distinct generations of people worked on the different types of data and used different vocabularies. Today, the processing power of modern computers makes it possible to consider approaches that analyze media files multimodally, i.e., processing the audio, the images, and the dependencies between the images synergistically.
Combined processing promises improved robustness in many situations and is closer to what humans do. The human brain takes into account not only patterns of illumination on the retina or periods of excitation in the cochlea, but also combines information from different senses and benefits from past experience. Humans are able to use context information and to fill in missing data by associating parts of objects with ones they have already learned.
The Roots: Machine Learning
Speech processing
Computer Vision
Natural Language Processing
Semantic Technologies
add your own field...